NSF PAR Search | NSF Public Access Repository

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

Cascaded Dimension Reduction for Effective Anomaly Detection

https://doi.org/10.1109/BigData52589.2021.9671364

Zhuo, Xiaoyan; Moon, Aekyeung; Zhang, Jialing; Son, Seung Woo (December 2021, IEEE International Conference on Big Data (Big Data))

Full Text Available
DPZ: Improving Lossy Compression Ratio with Information Retrieval on Scientific Data

https://doi.org/10.1109/Cluster48925.2021.00056

Zhang, Jialing; Chen, Jiaxi; Zhuo, Xiaoyan; Moon, Aekyeung; Son, Seung Woo (September 2021, IEEE International Conference on Cluster Computing (CLUSTER))
null (Ed.)
Full Text Available
Anomaly Detection in Edge Nodes using Sparsity Profile

https://doi.org/10.1109/BigData50022.2020.9377757

Moon, Aekyeung; Zhuo, Xiaoyan; Zhang, Jialing; Son, Seung Woo; Jeong Song, Yun (December 2020, 2020 IEEE International Conference on Big Data (Big Data))
null (Ed.)
Edge devices with attentive sensors enable various intelligent services by exploring streams of sensor data. However, anomalies, which are inevitable due to faults or failures in the sensor and network, can result in incorrect or unwanted operational decisions. While promptly ensuring the accuracy of IoT data is critical, lack of labels for live sensor data and limited storage resources necessitates efficient and reliable detection of anomalies at edge nodes. Motivated by the existence of unique sparsity profiles that express original signals as a combination of a few coefficients between normal and abnormal sensing periods, we propose a novel anomaly detection approach, called ADSP (Anomaly Detection with Sparsity Profile). The key idea is to apply a transformation on the raw data, identify top-K dominant components that represent normal data behaviors, and detect data anomalies based on the disparity from K values approximating the periods of normal data in an unsupervised manner. Our evaluation using a set of synthetic datasets demonstrates that ADSP can achieve 92%–100% of detection accuracy. To validate our anomaly detection approach on real-world cases, we label potential anomalies using a range of error boundary conditions using sensors exhibiting a straight line in Q-Q plot and strong Pearson correlation and conduct a controlled comparison of the detection accuracy. Our experimental evaluation using real-world datasets demonstrates that ADSP can detect 83%– 92% of anomalies using only 1.7% of the original data, which is comparable to the accuracy achieved by using the entire datasets.
more » « less
Full Text Available
Bit-Error Aware Quantization for DCT-based Lossy Compression

Zhang, Jialing; Chen, Jiaxi; Moon, Aekyeung; Zhuo, Xiaoyan; Son, Seung Woo (September 2020, IEEE Conference on High Performance Extreme Computing)

Scientific simulations run by high-performance computing (HPC) systems produce a large amount of data, which causes an extreme I/O bottleneck and a huge storage burden. Applying compression techniques can mitigate such overheads through reducing the data size. Unlike traditional lossless compressions, error-controlled lossy compressions, such as SZ, ZFP, and DCTZ, designed for scientists who demand not only high compression ratios but also a guarantee of certain degree of precision, is coming into prominence. While rate-distortion efficiency of recent lossy compressors, especially the DCT-based one, is promising due to its high-compression encoding, the overall coding architecture is still conservative, necessitating the quantization that strikes a balance between different encoding possibilities and varying rate-distortions. In this paper, we aim to improve the performance of DCT-based compressor, namely DCTZ, by optimizing the quantization model and encoding mechanism. Specifically, we propose a bit-efficient quantizer based on the DCTZ framework, develop a unique ordering mechanism based on the quantization table, and extend the encoding index. We evaluate the performance of our optimized DCTZ in terms of rate-distortion using real-world HPC datasets. Our experimental evaluations demonstrate that, on average, our proposed approach can improve the compression ratio of the original DCTZ by 1.38x. Moreover, combined with the extended encoding mechanism, the optimized DCTZ shows a competitive performance with state-of-the-art lossy compressors, SZ and ZFP.
more » « less
Full Text Available
AD ² : Improving Quality of IoT Data through Compressive Anomaly Detection

https://doi.org/10.1109/BigData47090.2019.9005533

Moon, Aekyeung; Zhuo, Xiaoyan; Zhang, Jialing; Son, Seung Woo (December 2019, IEEE International Conference on Big Data (Big Data))

With recent technological advances in sensor nodes, IoT enabled applications have great potential in many domains. However, sensing data may be inaccurate due to not only faults or failures in the sensor and network but also the limited resources and transmission capability available in sensor nodes. In this paper, we first model streams of IoT data as a handful of sampled data in the transformed domain while assuming the information attained by those sampled data reveal different sparsity profiles between normal and abnormal. We then present a novel approach called AD2 (Anomaly Detection using Approximated Data) that applies a transformation on the original data, samples top k-dominant components, and detects data anomalies based on the disparity in k values. To demonstrate the effectiveness of AD2 , we use IoT datasets (temperature, humidity, and CO) collected from real-world wireless sensor nodes. Our experimental evaluation demonstrates that AD2 can approximate and successfully detect 64%-94% of anomalies using only 1.9% of the original data and minimize false positive rates, which would otherwise require the entire dataset to achieve the same level of accuracy.
more » « less
Full Text Available
Towards Improving Rate-Distortion Performance of Transform-Based Lossy Compression for HPC Datasets

https://doi.org/10.1109/HPEC.2019.8916286

Zhang, Jialing; Moon, Aekyeung; Zhuo, Xiaoyan; Son, Seung Woo (September 2019, IEEE High Performance Extreme Computing Conference (HPEC))

As the size and amount of data produced by high-performance computing (HPC) applications grow exponentially, an effective data reduction technique is becoming critical to mitigating time and space burden. Lossy compression techniques, which have been widely used in image and video compression, hold promise to fulfill such data reduction need. However, they are seldom adopted in HPC datasets because of their difficulty in quantifying the amount of information loss and data reduction. In this paper, we explore a lossy compression strategy by revisiting the energy compaction properties of discrete transforms on HPC datasets. Specifically, we apply block-based transforms to HPC datasets, obtain the minimum number of coefficients containing the maximum energy (or information) compaction rate, and quantize remaining non-dominant coefficients using a binning mechanism to minimize information loss expressed in a distortion measure. We implement the proposed approach and evaluate it using six real-world HPC datasets. Our experimental results show that, on average, only 6.67 bits are required to preserve an optimal energy compaction rate on our evaluated datasets. Moreover, our knee detection algorithm improves the distortion in terms of peak signal-to-noise ratio by 2.46 dB on average.
more » « less
Full Text Available
Efficient Encoding and Reconstruction of HPC Datasets for Checkpoint/Restart

https://doi.org/10.1109/MSST.2019.00-14

Zhang, Jialing; Zhuo, Xiaoyan; Moon, Aekyeung; Liu, Hang; Son, Seung Woo (May 2019, 35th Symposium on Mass Storage Systems and Technologies (MSST))

As the amount of data produced by HPC applications reaches the exabyte range, compression techniques are often adopted to reduce the checkpoint time and volume. Since lossless techniques are limited in their ability to achieve appreciable data reduction, lossy compression becomes a preferable option. In this work, a lossy compression technique with highly efficient encoding, purpose-built error control, and high compression ratios is proposed. Specifically, we apply a discrete cosine transform with a novel block decomposition strategy directly to double-precision floating point datasets instead of prevailing prediction-based techniques. Further, we design an adaptive quantization with two specific task-oriented quantizers: guaranteed error bounds and higher compression ratios. Using real-world HPC datasets, our approach achieves 3x-38x compression ratios while guaranteeing specified error bounds, showing comparable performance with state-of-the-art lossy compression methods, SZ and ZFP. Moreover, our method provides viable reconstructed data for various checkpoint/restart scenarios in the FLASH application, thus is considered to be a promising approach for lossy data compression in HPC I/O software stacks.
more » « less
Full Text Available
Evaluating fidelity of lossy compression on spatiotemporal data from an IoT enabled smart farm

https://doi.org/10.1016/j.compag.2018.08.045

Moon, Aekyeung; Kim, Jaeyoung; Zhang, Jialing; Son, Seung Woo (November 2018, Computers and Electronics in Agriculture)

Full Text Available

Search for: All records